Abstract —Exploiting interaction with the environment is a
promising and powerful way to enhance the stability and
robustness of humanoid robots while executing locomotion and
manipulation tasks. Recently some works have started to show
advances in this direction considering humanoid locomotion 
with multi-contacts, but to be able to fully develop such abilities 
in a more autonomous way, we need to first understand and 
classify the variety of possible poses a humanoid robot can 
achieve to balance. To this end, we propose the adaptation of 
a successful idea widely used in the field of robot grasping to 
the field of humanoid balance with multi-contacts: a whole- 
body pose taxonomy classifying the set of whole-body robot 
configurations that use the environment to enhance stability. We 
have revised criteria of classification used to develop grasping 
taxonomies, focusing on structuring and simplifying the large 
number of possible poses the human body can adopt. We 
propose a taxonomy with 46 poses, containing three main 
categories, considering number and type of supports as well 
as possible transitions between poses. 

The taxonomy induces a classification of motion primitives 
based on the pose used for support, and a set of rules to store 
and generate new motions. We present preliminary results that 
apply known segmentation techniques to motion data from the 
KIT whole-body motion database. Using motion capture data 
with multi-contacts, we can identify support poses providing 
a segmentation that can distinguish between locomotion and 
manipulation parts of an action. 

I. INTRODUCTION 

In the last decades, we have seen many successful results 
to make humanoid robots walk, stand and run maintaining 
balance [1], [2], [3]. However, such movements are still not 
robust enough for real life environments. While contacts 
with the arms can be used to achieve more stable poses and 
movements, the problem of generating whole-body motions 
with multi-contacts is very challenging and the traditional 
ZMP approach cannot be used. Several approaches exist, 
either using complex planning algorithms with constraints 
or using control methodologies to handle multi-contact situations
with impressive results [4], [5], [6], [7]. However, these
solutions are still computationally very expensive and not 
able to perform autonomous locomotion tasks with multiple 
contacts. 

Recent results in simulation of walking in unstructured
environments show a humanoid using a walking staff [8].
The staff allows combinations of two, three and
four support poses to be used to obtain more stable walking. However,
the robot could gain stability in many different ways, not 
only using tools but also its own body. Understanding all 
the possible ways the body can be used for balancing can 
Fig. 1. To perform the task of hitting the red object, several poses can be
chosen to provide the stability that the task requires. The support poses are
defined by the contacts highlighted in green. The numbers under the sketches
refer to the id number of the support class according to our taxonomy in
Table I.


be crucial not only to provide more stable motions, but to 
generate alternative fall recovery strategies, complex motions 
with large contact areas such as crawling, or the generation 
of more autonomous locomotion and manipulation (loco-manipulation)
actions (Fig. 1).

Our goal is to go a step further in the understanding of the 
possible range of poses that can provide stability to the robot. 
To do so, we have been inspired by grasping. The main tools 
to understand how the hand can hold an object are the grasp 
taxonomies [9], [10], [11], [12]. Grasp taxonomies have been 
proven to be useful in many contexts: to provide a benchmark 
to test the abilities of a new robotic hand, to simplify grasp 
synthesis, to guide autonomous grasp choices, or to inspire 
hand design, among others. 

When a humanoid uses its body to gain stability through 
contacts with its environment, the dynamic equations to 
achieve equilibrium are similar to those of grasping, where 
the body plays the role of the hand and the object is the 
environment. Let us refer to these body poses as whole-body
grasps. In this work, we propose a classification of
whole-body grasps exploring criteria similar to those used
for taxonomies in grasping. Most grasping taxonomies define
two main categories: precision and power grasping. In addition,
Cutkosky classifies grasps according to object shape and
task [9], Kamakura according to task and the hand areas of
contact [11], [12], and Feix et al. according to number and type
of contacts and the configuration of the thumb [10]. Indeed,
many of these concepts carry over. We can directly use the
idea of precision vs. power grasping for whole-body grasps:
poses where contacts are realized only using the end-effectors
vs. poses that use contact with the torso. There is also
an important difference: almost all whole-body grasps are
non-prehensile, i.e., grasps that use gravity to hold the
object. We will explore in detail how this affects the criteria
used to obtain a whole-body pose taxonomy.














Whole-body motion is widely studied in many areas 
such as computer graphics, biomechanics, and robotics. An 
important aspect of motion analysis is how to characterize 
motions, using sketches [13] or smart selection of key frames 
[14], [15]. A proper characterization of a motion is crucial
for efficient search in large whole-body motion databases,
but also for movement segmentation and semantic interpretation
of actions with applications to imitation learning and
autonomous robotics [16], [17], [18]. In the field of computer 
graphic animations, there are works that can successfully 
plan a motion with multiple contacts and complex interaction 
with the environment [19], [20]. However, the autonomous
generation of complex interactions with the environment for
robotics is still an open problem due to the computational
complexity of the planning, the robot's lack of awareness of the
available options and the uncertainty at the time of execution.

While in classic locomotion actions such as walking and 
running the transitions between double and single support 
poses are very well understood [21], [22], such transitions 
can become much more complex when the possibility of 
leaning against a surface with the hands is considered. We 
are interested in identifying balance poses during motions to 
be able to understand the motion and segment it into motion 
primitives based on the support poses. Our idea is then 
to reproduce and generate new locomotions that combine
motion primitives based on support poses, similarly to what is
done for manipulation actions [18].

However, the human body is complex, and so, from the 
biomechanics point of view, a taxonomy of stable human 
body poses could become very complex, as some humans
have outstanding capabilities not only for stable walking, but
also for standing on tip-toe and performing all kinds of acrobatic
movements. In this work, we focus on robotics and our main
priority will be to simplify the large number of possible poses 
in a way that can be useful for current humanoid robots, such 
as HRP-2, HRP-4 [23], ASIMO [24], TORO [25], ARMAR- 
4 [26], etc. Therefore, we do not intend to provide a complete 
taxonomy that covers all the possible configurations the 
human body affords. 

To the best of our knowledge, there is no classification 
of static humanoid support poses. This is probably due to 
the relative novelty of the problem. A full taxonomy of 
whole-body grasps can have many interesting applications 
and uses, such as a tool for autonomous decision making, a 
guide to design complex motions combining different whole- 
body grasps, a way to simplify the control complexity, a 
benchmark to test abilities for humanoid robots, and a way 
to improve recognition of body poses and transitions between 
them. 

This paper is organized as follows. Section II introduces
the proposed taxonomy and discusses the criteria that have
been taken into account. Section III defines a support pose
and its relationship with motion and actions, proposing
a motion analysis method to identify the locomotion and
manipulation parts of an action. An example of the proposed
analysis is shown in Section IV. Finally, Section V summarizes
the presented work.



Fig. 2. Types of support contacts with arms and legs (recoverable panel labels: tip-toe, feet, knee).


II. TAXONOMY OF STABLE WHOLE-BODY POSES 

When considering the whole body interacting with the 
environment, there is a wide range of different postures that 
the robot can adopt. We are interested in those poses that 
use contacts for balancing. Then, the limb end-effectors that 
are not used for balancing can be used to perform other 
manipulation tasks. This way, we provide a framework for 
loco-manipulation poses. 

Table I shows our proposed taxonomy. It contains
a total of 46 classes, divided into three main categories:
standing, kneeling and resting. Each row corresponds to a
different number of supports, and in each row, different
columns correspond to different contact types (see the contact
type legend at the bottom left corner of Table I). In addition,
colors differentiate the type of leg supports, and poses under the
gray area use line contacts (with arms or legs). The lines
between boxes indicate possible pose transitions assuming 
only one change of support at a time. 
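
The one-change-at-a-time rule behind the transition lines can be illustrated with a small sketch. It assumes a hypothetical encoding in which a pose is the set of (limb, contact type) pairs providing support; the pose names below are illustrative and are not the class ids of Table I. Only adding or removing a support counts as a change here; changing the contact type of a limb already in support (e.g. from foot to knee) would need an extra rule.

```python
from itertools import combinations

# Hypothetical encoding (not from the paper): each pose is the set of
# (limb, contact type) pairs that provide support.
POSES = {
    "double_foot": frozenset({("LF", "foot"), ("RF", "foot")}),
    "left_foot": frozenset({("LF", "foot")}),
    "right_foot": frozenset({("RF", "foot")}),
    "double_foot_right_hold": frozenset(
        {("LF", "foot"), ("RF", "foot"), ("RH", "hold")}
    ),
    "left_foot_right_hold": frozenset({("LF", "foot"), ("RH", "hold")}),
}


def one_support_change(a, b):
    """True if b is reached from a by adding or removing exactly one support."""
    return len(a ^ b) == 1


def transition_edges(poses):
    """Undirected edges between poses that differ by a single support."""
    return sorted(
        (p, q)
        for p, q in combinations(sorted(poses), 2)
        if one_support_change(poses[p], poses[q])
    )


if __name__ == "__main__":
    for origin, destination in transition_edges(POSES):
        print(origin, "<->", destination)
```

Under this encoding, a direct edge between the two single foot supports is absent, since it would require releasing one foot and placing the other at the same time.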

A. Number and type of contacts 

One of the first relevant characteristics that greatly modifies
the complexity of a motion is the number of supports
with the environment. Kinematically, each support creates
a new closed kinematic loop, and therefore, reduces by
1 the dimension of the feasible configuration space [27],
[28]. Dynamically, planners of complex motions tested on
humanoid robots report higher execution times for a higher
number of supports [5]. Therefore, the number of supports
with the environment is a clear first layer of classification,
resulting in different rows for the taxonomy in Table I.

Contacts with environmental elements that do not provide 
support are not considered for the taxonomy classification. 
For instance, in Fig. 1, green marked contacts define the
support pose, while the rest are contacts intended to manipulate
the environment that do not affect the support
pose definition.

From the control point of view, the nature of the contact
used to provide the support [29], [30] and the part of the
body that performs the support are very relevant, because
the resultant kinematics of the robot change accordingly. A
fingertip contact is usually modeled as a point contact with


















TABLE I 

Taxonomy of balancing whole-body poses 
The proposed taxonomy has 46 classes, including 18 standing poses, 18 kneeling poses and 10 resting configurations. Sketches represent all the ranges of poses with the same number of supports and type 
of contact. Each class includes the symmetric cases when applicable. The lines provide possible road maps to transfer from one pose to another, assuming we can just perform one contact change at a time. 
Lines also provide a hierarchy among the poses. Blue lines represent transitions to different categories (from standing to kneeling). 




































































































































































friction, the foot support as a plane contact, and arm leaning
can be modeled using a line-with-friction model [30]. Fig. 2
shows different possible types of support with the legs and
arms. Considering all the types in the figure leads to 190
possible combinations. Because we would like to keep our
taxonomy simple, we have considered only 5 types: hold, palm,
arm, feet, and knee support. These lead to the consideration
of 51 combinations, from which we have selected 36 (corresponding
to the standing and kneeling poses). This choice has
been made assuming that some combinations, while feasible,
are not common. However, after further analysis of different
motions, more classes may be included or excluded
in the future.

B. From precision to power whole-body grasps 

In addition to the standing and kneeling poses we have 
added 10 extra classes where there is contact with the torso. 
We have called them resting poses. Poses from r.1 to r.4
are poses where balance still needs to be achieved, but the
inclination of the torso needs to be controlled. Poses from 
r.5 to r.6 are stable provided that the areas of contact are flat 
and with friction. Finally, using poses from r.7 to r.10 the 
robot is unlikely to lose balance and can be considered safe 
and completely in rest, but with very limited mobility. 

At this stage of work, no transitions are shown between 
resting poses and the rest of the table. Such transitions are 
more complex and require further motion analysis that will 
be left for future work. 

C. Shape of the environment 

Many grasping taxonomies include the shape of the object
as a criterion for grasp choice. Indeed, object shape and
size have a great influence on the ability for grasping and 
manipulation. 

However, there is a fundamental difference between hand
grasping and whole-body grasping: the need for gravity to
reach force closure. A hand grasp will always start with
no contacts at all, and after grasping, it may or may not
start a manipulation motion that either maintains constant
contacts (in-grasp manipulation) or performs re-grasps
[31]. In contrast, a whole-body grasp is always part
of a motion sequence of re-grasps that will always start with 
at least one contact with the environment (even if one of 



Fig. 3. Implementation of poses of the classes 4.1 and 4.10 for different 
environment shapes. 


the phases has no contact as in a running locomotion or 
jumping). 

For this reason, we believe that whole-body grasp choice
will not depend as much on the shape of the room as on
the task/motion in which the pose occurs. For instance, given the two
poses in Fig. 3, if the environment is only a floor and a
wall, as in the left lower shape, the choice between pose
4.1 or 4.10 will be mainly influenced by the motion/task.
At the beginning of a fall recovery motion the
choice will most likely be 4.10 alternating with 3.13, while
at the end, once the robot is already standing, the choice will
most probably be 4.1.

In addition, the number and nature of constraints the poses 
in a class have to maintain are all the same regardless of 
environment shape, just the directions of the normals on the 
contact points change. 

For these reasons, we consider poses with different environment
shape to be the same class. However, environment
shape has to be taken into account. Therefore, in Section
III we introduce a definition for pose class that in each
instantiation contains information about the specific shape
of the environment using the normals at the contacts.


D. Stability 

The taxonomy in Table I is organized so that the less
stable poses, with fewer supports, lie on the upper
left side, while the most stable ones lie on the lower right
side, assuming that the more contacts there are and the
larger the surfaces of contact, the more stable the robot
is. Works like [32] show that there is a trade-off between
stability and maneuverability during goal-directed whole-body
movements. In the taxonomy, we observe a similar
trade-off between mobility and stability.

However, inside any class it is possible to obtain different 
levels of stability depending on the support region [33] and 
the sum of the contact wrenches [3]. 


III. CLASSIFICATION OF ACTIONS AND 
MOTIONS 

The proposed taxonomy induces a formalization of whole- 
body motions/actions depending on the support poses. To 
formalize this in more detail, we first need to define how to 
instantiate a pose class and the relationship between poses 
and motions. 


A. Support pose class 

In this context, we first need to define a contact as 

$$C = \{l, m, c, n\} \tag{1}$$

where l is the link in contact, m is the model of the 
contact, c are the global coordinates of the contact location 
and n the normal direction of the surface of contact. The 
contact model defines the number and nature of constraints, 
which will be unilateral in the case of a plane or a line 
contact, or bilateral in the case of a hold support. Robots 
can autonomously obtain the information about the shape of 
the environment using advanced perception techniques. Our 










group is working on methods to extract geometric primitives 
of the environment, providing information about location and 
normal direction of possible contacts in the scene [34]. 

Then, the instantiation of a class of the taxonomy is 
defined by 

$$\{id,\ p,\ \mathcal{C} = \{C_i,\ i = 1 \ldots m\},\ \mathcal{N}\} \tag{2}$$

where $id$ is an identification of the class in the taxonomy, $p$
the location of the center of mass of the robot (CoM), $\mathcal{C}$ the
set of $m$ contacts and $\mathcal{N}$ the set of neighbor classes.

This way, for each class the robot can be represented as 
a simplified model that contains the CoM, contact locations 
and contact normal directions in a similar way as in the 
works of Lemerle group [35], [29]. The information that 
defines each class instantiation is sufficient to define the 
set of constraints and equilibrium conditions that need to 
be satisfied [33], and to design the corresponding controllers 
to allow motions inside the class. 
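
As a minimal sketch, the contact of Eq. (1) and the class instantiation of Eq. (2) could be stored as plain records. All field names, the set of contact model labels and the example coordinates are assumptions made for illustration; only the unilateral/bilateral distinction (plane and line contacts vs. hold) is taken from the text, and point contacts are treated as unilateral here by assumption.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

# Assumption: only a "hold" contact is bilateral; every other model
# ("plane", "line", "point") is treated as unilateral.
BILATERAL_MODELS = {"hold"}


@dataclass(frozen=True)
class Contact:
    """One contact C = {l, m, c, n} as in Eq. (1)."""
    link: str        # l: link in contact
    model: str       # m: contact model, e.g. "plane", "line", "hold"
    location: Vec3   # c: global coordinates of the contact
    normal: Vec3     # n: normal direction of the contact surface

    @property
    def bilateral(self) -> bool:
        return self.model in BILATERAL_MODELS


@dataclass
class SupportPose:
    """Instantiation of a taxonomy class as in Eq. (2)."""
    class_id: str                 # id of the class in the taxonomy
    com: Vec3                     # p: center of mass location
    contacts: List[Contact]       # set of contacts
    neighbors: List[str] = field(default_factory=list)  # neighbor class ids


# Illustrative instance: double feet support on flat ground.
# The coordinates are made up; the neighbor list is deliberately partial.
pose = SupportPose(
    class_id="2.3",
    com=(0.0, 0.0, 0.9),
    contacts=[
        Contact("left_foot", "plane", (0.0, 0.1, 0.0), (0.0, 0.0, 1.0)),
        Contact("right_foot", "plane", (0.0, -0.1, 0.0), (0.0, 0.0, 1.0)),
    ],
    neighbors=["1.1"],
)
```

Keeping the normals inside each contact is what lets one class cover different environment shapes, as argued in Section II-C.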

B. Pose transitions and motions 

A transition between two classes can happen by first 
imposing the constraints of the current and destination class, 
and then shifting to only the constraints of the destination 
class. This induces the definition of two types of motions:

1) Inside class motion: A purely manipulation action
will happen inside a single class. It includes other
manipulation motions and therefore, extra contacts
with objects, always with the objective of manipulation.
As a manipulation motion, it can be semantically
segmented and interpreted as done in [16].

2) Transition class motion: motions that define a transition
between poses. The motion still occurs inside a
class, but the motion consists in the shifting towards a
destination class, as part of a locomotion. For instance,
a double feet support motion that shifts towards a right
foot support (2.3 → 1.1).

Note that both motions always happen inside the same
support class, but in the second case, the destination class is 
relevant for the motion definition. 

In this context, the taxonomy provides a classification that
allows storing previously computed motions associated with
support poses. Transition class motions can be simple enough
to convert them to motion primitives. Inside class motions 
may need further segmentation. Next, we go a step further 
to associate motion to tasks or actions. 

C. Tasks and actions 

Previous works on manipulation tasks such as [36] defined 
an action as an interaction between a hand and an object to 
induce a change in the object. In addition, they defined action
components as those time-points where contact relations 
between hand and objects change. With these simple rules, 
they can classify all single hand manipulations in only 6 
categories. 

Following a similar approach, we can also define a whole- 
body action as an interaction between the body and the 
environment, but with two possible objectives: to induce a 




Fig. 4. Implementation of poses of the class 4.1 and 3.1, for a type III 
manipulation where all contacts are used both for balance and to change 
the environment. 

change in the environment or in the body itself. In the latter, 
we refer to whole-body actions where the main objective is 
to relocate (locomotion) or to gain stability (balance), not 
to change the environment. Action components can also be
defined as those time-points where the number of contacts
changes.

Using these definitions, we can define three main categories
of actions associated with the motions we defined before:

• (I) actions that are intended to change the environment 
(that will be done using inside class motions) 

• (II) actions that are intended to change the body/robot 
(using transition class motions) and 

• (III) a combination of the above: actions where supports 
are used both to balance and to change the environment. 

Fig. 1 shows an example of an action of type (I), called
hit object in [36]. The task requires a single end-effector
to hit an object. Then, according to the requirements of the
object (location, size or weight), we could decide between 1,
2 or 3 support contacts to define the support pose where the
manipulation can take place. In Fig. 1 we show 4 possible
support poses: the standing pose 2.3, or pose 3.4 using an 
extra support on a table to be able to reach further, or a pose 
where the manipulation is done with the foot and the support 
is provided using the hands as in poses 3.1 or 2.1. 

Special cases are actions of type (III), like the example in
Fig. 4. Here, the four contacts are used both for support and
to modify the environment. We can identify such actions
because they start as manipulation actions (inside class
motions) but they occur during several pose transitions, and
therefore, constitute a combined locomotion and manipulation
action.

This framework induces a set of segmentation criteria for a 
given motion that, provided that we can differentiate support 
contacts and manipulation contacts, subdivides a motion into 
pieces that can be related with types of actions. For actions 
identified as manipulation (type I), further segmentation 
based on the manipulation contacts can be performed [16], 
providing a hierarchy of segments distinguishing between the 
locomotion and the manipulation parts of an action. 

The lines between boxes in the taxonomy represent possible
transitions of only one support change. However, we want
to analyze support pose transitions during loco-manipulation 
motions in humans to improve the proposed taxonomy and 
validate the proposed transitions. In addition, the motion 
analysis can provide a better semantic understanding of 
complex locomotion and manipulation actions for imitation 












Fig. 5. Speed of the considered end-effectors during the motion. Grid
lines on the X axis show the segmentation frames detected where there
was a change in support contacts. Grid lines on the Y axis show the speed
threshold considered.

learning and autonomous decision making applications. 

To do so, we propose to use the publicly available 
KIT whole-body human motion database [37], [38]. The
database contains a large amount of motion capture data not only of
human motions, but also of objects that are being used
or manipulated during the motion. Including environmental
elements in the motion capture data is the key aspect that 
allows us to use this database to analyze loco-manipulation 
actions. In addition, the database provides models of the 
objects and a normalized subject-independent representation 
of the motion capture data, based on a framework called 
Master Motor Map (MMM) [39]. The idea behind the MMM 
framework is to offer a unifying reference model of the 
human body with kinematic and dynamic parameters. This 
allows us to analyze the position data of different segments 
of the body, and to detect collision with the objects of the 
environment. In addition, MMM motions can be converted to
other robots with different kinematic and dynamic structures,
offering the possibility to transfer the pose transition motions
to a humanoid robotic platform.

The atoms of motion, segmented using the proposed analysis,
can be used to define Dynamic Motion Primitives [40].
The taxonomy can help to define a formal grammar to build 
new motions to provide robots with autonomous techniques 
to generate possible motions for a given environment. 

In the next section we show an example of the proposed
motion analysis applied to one of the motions in the database. The
analysis allows us to visualize the motion as a subgraph of 
the taxonomy. 

IV. EXAMPLE 

The analyzed motion consists of going up a set of stairs
with a handle at the right hand side, lasting 5.5 s at 100
frames/s¹. We combine a segmentation based on collision
detection to detect contacts, which has been used in previous
works to semantically interpret manipulation actions [16],
with an analysis of the velocities of the extremities that
can be in contact. A support contact is characterized by

¹ Raw data files can be found in the upstairs05 motion files at https://motion-database.humanoids.kit.edu/details/motions/383/


having zero 6-dim velocity [41]. However, motion capture
systems are noisy and cannot capture the exact contact
point location, and therefore, the velocities of the support
segment are only close to zero. We have simplified this to
consider only the magnitude of the positional (linear) velocity. The
velocities are obtained by numerical differentiation of the po¬ 
sition data corresponding to the four end-effectors considered 
for possible supports: the two feet and the two hands. After 
inspecting the frequency content of these signals, we have 
low-pass filtered the position data at a cut-off frequency of 
1.5Hz to reduce the effect of noise. The resulting computed 
velocities are plotted in Fig. [5] for each of the end-effectors. 
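
The speed computation described above can be sketched as follows. The text does not state which low-pass filter was used, so a first-order recursive filter is assumed here purely for illustration; the function names and the tuple-based position format are also hypothetical. The frame rate (100 frames/s) and cut-off frequency (1.5 Hz) are the values given in the text.

```python
import math


def low_pass(samples, cutoff_hz, rate_hz):
    """First-order recursive low-pass filter over a list of 3D positions.

    Assumption: the filter type is not specified in the text; this is a
    simple stand-in, not the authors' filter.
    """
    dt = 1.0 / rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)
    filtered = [samples[0]]
    for p in samples[1:]:
        prev = filtered[-1]
        filtered.append(
            tuple(prev[k] + alpha * (p[k] - prev[k]) for k in range(3))
        )
    return filtered


def speeds(positions, rate_hz):
    """Magnitude of the finite-difference velocity between frames (m/s)."""
    return [
        math.dist(a, b) * rate_hz
        for a, b in zip(positions, positions[1:])
    ]


if __name__ == "__main__":
    # Synthetic end-effector track in meters, sampled at 100 frames/s.
    track = [(0.002 * i, 0.0, 0.0) for i in range(10)]
    smoothed = low_pass(track, cutoff_hz=1.5, rate_hz=100.0)
    print(speeds(smoothed, rate_hz=100.0))
```

A recursive filter of this kind introduces a phase lag, so a zero-phase filter may be preferable when segment boundaries must be located precisely.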

Despite the noise in the data, we can identify different 
hypotheses of support using a speed threshold of 0.15m/s. 
Such hypotheses are then validated against the contact segmentation
that is computed based on collision detection
between the objects of the scene (the stairs and the floor) 
[16]. We discard the hypothesis of support if the extremity 
is not in contact with any object of the environment. The 
velocity information is considered more relevant in this case 
than the collision, because often the hand is very close to 
the handle but slides on it, without providing a real support. 
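
The two-step test described above (speed threshold, then validation against collision-based contact) and the resulting segmentation can be sketched as below. The input format, with one per-frame list per end-effector, and the function names are assumptions; the 0.15 m/s threshold is the value from the text, and the toy data are made up.

```python
SPEED_THRESHOLD = 0.15  # m/s, value used in the text


def support_sets(speed, in_contact, threshold=SPEED_THRESHOLD):
    """Per-frame set of end-effectors accepted as supports.

    speed, in_contact: dicts mapping an end-effector name to a per-frame
    list (hypothetical input format). An end-effector is a support when it
    is slow enough AND collision detection reports a contact.
    """
    n_frames = len(next(iter(speed.values())))
    return [
        frozenset(
            name
            for name in speed
            if speed[name][t] < threshold and in_contact[name][t]
        )
        for t in range(n_frames)
    ]


def segment(frames):
    """Group consecutive frames with equal supports: (supports, start, length)."""
    segments = []
    for t, supports in enumerate(frames):
        if segments and segments[-1][0] == supports:
            prev, start, length = segments[-1]
            segments[-1] = (prev, start, length + 1)
        else:
            segments.append((supports, t, 1))
    return segments


if __name__ == "__main__":
    # Toy data: the left foot lifts off while the right hand settles.
    speed = {
        "LF": [0.0, 0.0, 0.5, 0.5],
        "RF": [0.0, 0.0, 0.0, 0.0],
        "RH": [0.1, 0.1, 0.1, 0.1],
    }
    in_contact = {
        "LF": [True, True, False, False],
        "RF": [True, True, True, True],
        "RH": [False, False, True, True],
    }
    for supports, start, length in segment(support_sets(speed, in_contact)):
        print(sorted(supports), start, length)
```

With this rule, a hand that touches the handle but slides along it above the threshold is not counted as a support, which matches the priority given to velocity over collision in the text.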

In Fig. 6 the resulting segments are represented showing
the middle frame of each segment, and they are labeled
according to the detected supports (RF and LF stand for
right and left foot, while RH and LH stand for right and left
hand). We also show the length of each segment as number
of frames.

Using the taxonomy of whole-body poses, we can analyze 
further the obtained segmentation by showing the number 
of pose transitions occurring during the motion as a graph
(Fig. 7). Each arrow shows the direction between origin
and destination pose, and is labeled according to the order
in which the transition occurs. This offers a visual representation that allows
us to rapidly observe that the motion starts and finishes in 
the double feet support and that the single support pose is 
visited 4 times, 2 for each foot (and so, the motion contains 4 
steps). We can also observe that the cycles are not symmetric, 
because the handle is on the right side of the body. Therefore
the support on the handle occurs mostly during the right foot
support, while during the left foot support both the right
foot and the right hand are swung towards the next support 
location. The left foot with hand support occurs at the end 
of the motion, when both feet go to the same step to finalize 
the walking. 
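
The graph view of a segmented motion can be sketched in a few lines. The sequence below is a made-up example, not the segmentation of Fig. 6, and the helper names are hypothetical; each pose is represented by the set of end-effectors in support.

```python
from collections import Counter


def pose(*limbs):
    """Hypothetical pose label: the set of end-effectors in support."""
    return frozenset(limbs)


def transition_graph(segment_supports):
    """Ordered edges (order, origin, destination) and visit counts per pose."""
    edges = [
        (i + 1, origin, destination)
        for i, (origin, destination) in enumerate(
            zip(segment_supports, segment_supports[1:])
        )
    ]
    visits = Counter(segment_supports)
    return edges, visits


# Made-up sequence of segment supports for illustration only.
sequence = [
    pose("LF", "RF"),
    pose("RF"),
    pose("RF", "RH"),
    pose("LF", "RF", "RH"),
    pose("LF", "RF"),
]
edges, visits = transition_graph(sequence)

if __name__ == "__main__":
    for order, origin, destination in edges:
        print(order, sorted(origin), "->", sorted(destination))
    print({tuple(sorted(p)): n for p, n in visits.items()})
```

Counting visits per pose is what allows statements such as how many times a single foot support occurs, and hence how many steps a motion contains.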

It is also worth noticing that some of the segments are
very short (2 to 5 frames). This implies that in practice,
more than one change of support occurs almost immediately.
However, we believe that for robotic movements, it is safer to
always take into account these short intermediate poses. In
walking motions, the double support phase is always much
shorter than the single support, and the faster the motion, the
shorter the double support. In this case, we can also say that
besides the initial and final segments, the double foot supports
with or without hand have much shorter segment durations
than phases with single foot support (with or without hand
support).

























Fig. 6. Result of the motion segmentation. We show the middle frame of each segment, with a label that indicates the order, the extremities in contact 
and the length of the segment in number of frames. RF and LF correspond to right and left foot, while RH and LH right and left hand. 



Fig. 7. Graph analysis of the pose transitions occurring in the analyzed
motion. Blue edges correspond to transitions to or from left foot supports, and red
edges to transitions to or from right foot supports. The edge labels indicate the order
of the transition, in accordance with the labeling in Fig. 6.

As future work, we plan to record new motions involving
more poses of the taxonomy, and combining actions of
locomotion and manipulation. Analyzing the motion data 
will allow us to validate the suggested taxonomy, propose 
different routings of transitions, and modify the taxonomy 
accordingly if necessary. 

V. CONCLUSIONS 

This work, for the first time, has proposed a taxonomy of 
whole-body balancing poses containing 46 classes, divided 
into three main categories, considering number and type of 


support and possible transitions between support poses. We 
have analyzed known grasping criteria used to classify robot 
grasps, but focusing on the demands of whole-body poses. 
As opposed to grasping, we have given less relevance to 
environment shape and more to the type of contact the body 
uses to provide a support pose. 

We have proposed a formal definition to characterize a 
pose, as well as a characterization of motions according to 
the class pose where they take place. Analyzing an example 
of whole-body motion of going upstairs, we have shown a 
segmentation technique that allows us to identify several 
poses of the taxonomy, segment the motion according to 
the identified poses and provide a graph representation of 
a motion as transitions between such poses. 

We believe the proposed taxonomy has a lot of potential to 
be used in many areas of humanoid robotics. In future work, 
we plan to refine our classification using further motion data, 
store classified motions according to pose and convert and 
adapt such motions to humanoid robotic platforms that use 
their environment for extra support. Nonetheless, the range 
of applications is certainly much wider and we expect to 
contribute in many aspects of humanoid robotics research in 
the upcoming years.